Community  Detection  in  Multi-Dimensional  Networks 


Lei  Tang  •  Xufei  Wang  •  Huan  Liu 



Abstract The pervasiveness of Web 2.0 and social networking sites has enabled
people to interact with each other easily through various social media.
For instance, popular sites like Del.icio.us, Flickr, and YouTube allow users
to comment on shared content (bookmarks, photos, videos), and users can
tag their favorite content. Users can also connect with one another, and
subscribe to or become a fan or a follower of others. These diverse activities result
in a multi-dimensional network among actors, forming group structures with
group members sharing similar interests or affiliations. This work systematically
addresses two challenges. First, it is challenging to effectively integrate
interactions over multiple dimensions to discover hidden community structures
shared by heterogeneous interactions. We show that representative community
detection methods for single-dimensional networks can be presented in a unified
view. Based on this unified view, we present and analyze four possible
integration strategies to extend community detection from single-dimensional
to multi-dimensional networks. In particular, we propose a novel integration
scheme based on structural features. Another challenge is the evaluation of
different methods without ground truth information about community membership.
We employ a novel cross-dimension network validation procedure to compare
the performance of different methods. We use synthetic data to deepen
our understanding, and real-world data to compare integration strategies as
well as baseline methods at a large scale. We further study the computational
time of different methods, the effect of normalization during integration,
sensitivity to related parameters, and alternative community detection methods for
integration.

1 Introduction

The recent boom of social media (e.g., Del.icio.us, Flickr, YouTube, Facebook,
MySpace and Twitter) permits human interaction with unprecedented
convenience. With widely-available large-scale networks in social media, social
network analysis is gaining increasing attention from a variety of disciplines
including computer science, physics, economics, epidemiology, business marketing,
and behavioral science. One fundamental task is to find cohesive subgroups
(a.k.a. communities) whose members interact more frequently with each other
than with those outside the group [41]. The extracted communities can be
utilized for further analysis such as visualization [17], viral marketing [29],
determining the causal factors of group formation [3], detecting group evolution [28]
or stable clusters [4], relational learning [33], and building ontology
for semantic web [23,31,20].

A plethora of approaches have been proposed to address community detection
with network data. However, most existing work focuses on only one
dimension of interaction among people (i.e., a network comprised of interactions
of a single type). In reality, people interact with each other in assorted
forms of activities, leading to multiple networks among the same set of actors,
or a multi-dimensional network$^1$ with each dimension representing one
type of interaction. In the physical world, people interact with others in a
variety of ways, e.g., face-to-face, by email or by phone. The same is true in
cyberspace, as shown in Figure 1. For instance, at popular photo and video
sharing sites (Flickr and YouTube), a user can connect to his friends through
email invitations or the provided “add as contacts” function. Users can also
tag/comment on shared content such as photos and videos. A user on YouTube
can upload a video to respond to a video posted by another user, and can also
become a fan of another user by “subscribing” to the user’s content. Networks
can be constructed based on each form of activity. By combining them
together, we obtain a multi-dimensional network representing the richness of
user interaction. More generally, people can be active at multiple different social
networking sites. It is common for one user to be registered on several
social networking sites at the same time, e.g., Facebook, Twitter, BlogSpot,
YouTube, and Del.icio.us. In such cases, a multi-dimensional network can be
constructed with each dimension representing user interaction at each site.

For a multi-dimensional network with heterogeneous interactions, one type
of interaction might be insufficient to determine group membership accurately.
In social media, a certain type of interaction can be incomplete due to users’
privacy concerns. The interactions can also be noisy since it is much easier
to connect with another user online than in the physical world. Indeed, some
online users have thousands of online friends, whereas this could hardly be true
in reality. For instance, one user in Flickr has more than 19,000 friends. For
this kind of user, it is difficult to identify the community he/she is involved
in using the friendship network alone. On the other hand, many users in the
network might have only one or two friends. With these noisy and highly
imbalanced interactions, relying on one type of interaction alone might miss
the true user community structure.

1 Some researchers also use the term multi-relational network. In social science,
multi-relational network tends to refer to the case where multiple different relations exist
between two actors, while in the computer science domain it tends to refer to
a network with heterogeneous entities interacting with each other, which actually
corresponds to a multi-mode network [35]. Here, we use multi-dimensional network to emphasize
that actors are involved in disparate interactions.

Fig. 1: Multi-Dimensional Network

Integrating assorted forms of interaction can compensate for incomplete
information in each dimension as well as reduce the noise and obtain a more
reliable community structure. Some users might be reluctant to add friends,
but frequently engage in another activity such as uploading videos or commenting
on other videos. The interactions at different dimensions all indicate
user interests. Hence, one might infer a more accurate community structure by
integrating disparate interactions. However, idiosyncratic personalities lead to
varied local correlations between dimensions. Some people interact with group
members consistently in one form of activity, but infrequently in another. It
thus becomes a challenge to identify groups in multi-dimensional networks
because we have to fuse the information from all dimensions for integrated
analysis.

In this work, we first present representative approaches of community detection
with a unified view. Based on this unified view, we discuss potential
extensions of community detection in one-dimensional (1-D) networks to
multi-dimensional (M-D) networks. We present four integration strategies in
terms of network interactions, utility functions, structural features and community
partitions, respectively. Their pros and cons are discussed in detail.
Typically, a real-world network does not have full information about group
membership. Hence, a novel cross-dimension network validation procedure is
proposed to compare the communities obtained from different approaches. We
establish the veracity of this evaluation scheme based on synthetic data with
known community structure, and then apply it to real-world network data
to systematically compare different integration strategies.


2  Community  Detection  in  1-D  Networks 

In this section, we review existing representative methods for community detection
in one-dimensional networks, and then present a unified view of these
methods, preparing their extension to multi-dimensional networks.

Let $G(V, E)$ denote a network with $V$ the set of $n$ vertices and $E$ the set of $m$
edges, and let $A \in \{0, 1\}^{n \times n}$ denote the adjacency matrix (network interactions).
The degree of node $i$ is $d_i$, and $A_{ij} = 1$ if there is an edge between nodes $i$ and $j$.
Unless specified explicitly, we assume the network is undirected. A community
is defined as a group of actors with frequent interactions occurring between
them. Community detection attempts to uncover the community membership
of each actor. In particular, the problem is defined below:

Community Detection: Given a network $A \in \{0, 1\}^{n \times n}$ with $n$ being
the number of actors, and $k$ the number of communities in the network,
community detection aims to determine the community assignment of
each actor. The community assignment is denoted as $H \in \{0, 1\}^{n \times k}$
with

$$H_{ij} = \begin{cases} 1, & \text{if actor } i \text{ belongs to community } j \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$

In this work, we study the case where each actor belongs to only one community.
That is, $\sum_{j=1}^{k} H_{ij} = 1$. To resolve the community detection problem, various
approaches have been developed including latent space models, block model
approximation, spectral clustering and modularity maximization. Below, we
briefly review these representative methods and show that they can be interpreted
in a unified view.


2.1  Latent  Space  Models 

A latent space model maps the nodes in a network into a low-dimensional
Euclidean space such that the proximity between the nodes based on network
connectivity is preserved in the new space; the nodes are then clustered in the
low-dimensional space using methods like k-means [43]. One representative
approach is multi-dimensional scaling (MDS) [6]. Typically, MDS requires the
input of a proximity matrix $P \in \mathbb{R}^{n \times n}$, with each entry $P_{ij}$ denoting the
distance between a pair of nodes $i$ and $j$ in the network. For a network, a
commonly used proximity measure is geodesic distance [41], i.e., the length of
the shortest path between two nodes. Let $S \in \mathbb{R}^{n \times \ell}$ denote the coordinates of
nodes in the $\ell$-dimensional space such that the columns of $S$ are orthogonal. It can
be shown [6,30] that

$$SS^T \approx -\frac{1}{2}\left(I - \frac{1}{n}\mathbf{1}\mathbf{1}^T\right)(P \circ P)\left(I - \frac{1}{n}\mathbf{1}\mathbf{1}^T\right) = \tilde{P} \qquad (2)$$

where $I$ is the identity matrix, $\mathbf{1}$ an $n$-dimensional column vector with each
entry being 1, and $\circ$ the element-wise matrix multiplication. It follows that
$S$ can be obtained via minimizing the discrepancy between $\tilde{P}$ and $SS^T$ as
follows:

$$\min_{S} \|SS^T - \tilde{P}\|_F^2 \qquad (3)$$

Fig. 2: Basic Idea of Block Model Approximation

Suppose $V$ contains the top $\ell$ eigenvectors of $\tilde{P}$ with largest eigenvalues, and
$\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \dots, \lambda_\ell)$ is the diagonal matrix of the top $\ell$ eigenvalues.
The optimal $S$ is $S = V\Lambda^{1/2}$. Note that this multi-dimensional scaling corresponds
to an eigenvector problem of matrix $\tilde{P}$. Then the classical k-means algorithm can be
applied to $S$ to find community partitions.
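As a minimal sketch of the latent space procedure, the snippet below computes geodesic distances by breadth-first search, applies the double centering of Eq. (2), and forms $S = V\Lambda^{1/2}$. The toy graph (two triangles joined by one edge) and the function names `geodesic_distances` and `mds_features` are assumptions made for illustration only; the graph is assumed connected and unweighted.

```python
from collections import deque
import numpy as np

def geodesic_distances(A):
    """All-pairs shortest-path lengths by BFS (connected, unweighted graph assumed)."""
    n = A.shape[0]
    P = np.zeros((n, n))
    for s in range(n):
        dist = np.full(n, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in np.flatnonzero(A[u]):
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        P[s] = dist
    return P

def mds_features(P, ell):
    """Return S = V Lambda^{1/2} from the double-centered squared distances."""
    n = P.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix I - (1/n) 1 1^T
    P_tilde = -0.5 * J @ (P * P) @ J           # P * P is the element-wise square
    vals, vecs = np.linalg.eigh(P_tilde)       # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:ell]         # indices of the ell largest
    lam = np.clip(vals[top], 0.0, None)        # guard against small negative values
    return vecs[:, top] * np.sqrt(lam)

# Hypothetical toy network: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

P = geodesic_distances(A)
S = mds_features(P, ell=2)
```

On this toy graph the first coordinate of `S` takes one sign on nodes 0 to 2 and the opposite sign on nodes 3 to 5, so k-means on `S` would be expected to recover the two triangles.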


2.2  Block  Model  Approximation 

Block model approximation approximates a given network by a block
structure. The basic idea is visualized in Figure 2, where the left graph
shows a network and the right one is the block structure after we reorder
the index of actors according to their community membership. Each block
represents one community. Therefore, we approximate the network interaction
$A$ as follows:

$$A \approx S \Sigma S^T \qquad (4)$$

where $S \in \{0, 1\}^{n \times k}$ is the block indicator matrix, $\Sigma$ the block (group) interaction
density, and $k$ the number of blocks. A natural objective is to minimize
the following formula:

$$\min_{S, \Sigma} \|A - S \Sigma S^T\|_F^2 \qquad (5)$$

The discreteness of $S$ makes the problem NP-hard. We can relax $S$ to be continuous
but satisfy certain orthogonality constraints, i.e., $S^T S = I_k$; then the
optimal $S$ corresponds to the top $k$ eigenvectors of $A$ with maximum eigenvalues.
Similar to the latent space model, k-means clustering can be applied to
$S$ to recover the community partition $H$.
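To make the objective in Eq. (5) concrete, the sketch below fixes a discrete indicator matrix $S$ and computes the least-squares $\Sigma$, which for a 0/1 indicator is simply the mean of $A$ inside each block (diagonal entries included). The helper names `indicator` and `block_fit` and the toy graph are illustrative assumptions, not part of the original method description.

```python
import numpy as np

def indicator(labels, k):
    """Build the n x k block indicator matrix S from integer labels."""
    S = np.zeros((len(labels), k))
    S[np.arange(len(labels)), labels] = 1.0
    return S

def block_fit(A, S):
    """For fixed S, return the least-squares Sigma and the squared Frobenius error."""
    G = np.linalg.inv(S.T @ S)                 # diagonal matrix of 1 / block size
    Sigma = G @ S.T @ A @ S @ G                # per-block average of A
    err = np.linalg.norm(A - S @ Sigma @ S.T, "fro") ** 2
    return Sigma, err

# Hypothetical toy network: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

Sigma_good, err_good = block_fit(A, indicator([0, 0, 0, 1, 1, 1], 2))
Sigma_bad, err_bad = block_fit(A, indicator([0, 1, 1, 0, 0, 1], 2))
```

The partition that follows the two triangles yields a smaller approximation error than the scrambled one, which is the behavior the block model objective is meant to reward.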


2.3  Spectral  Clustering 

Spectral clustering [40] derives from the problem of graph partition. Graph
partition aims to find a partition such that the cut (the total number of
edges between two disjoint sets of nodes) is minimized. Though this cut minimization
can be solved efficiently, it often returns trivial and uninteresting
singletons, i.e., a community consisting of only one node. Therefore, practitioners
modify the objective function so that the group size of communities is
considered. Two commonly used variants are Ratio Cut and Normalized Cut.
Suppose we partition the nodes of a network into $k$ non-overlapping communities
$\pi = (C_1, C_2, \dots, C_k)$, then

$$\text{Ratio Cut}(\pi) = \sum_{i=1}^{k} \frac{cut(C_i, \bar{C}_i)}{|C_i|} \qquad (6)$$

$$\text{Normalized Cut}(\pi) = \sum_{i=1}^{k} \frac{cut(C_i, \bar{C}_i)}{vol(C_i)} \qquad (7)$$

where $\bar{C}_i$ is the complement of $C_i$, and $vol(C_i) = \sum_{v \in C_i} d_v$. Both objectives
attempt to minimize the number of edges between communities, yet avoid the
bias of trivial-size communities like singletons. Both can be formulated as a
min-trace problem as below:

$$\min_{S \in \{0, 1\}^{n \times k}} Tr(S^T L S) \qquad (8)$$

with $L$ (the graph Laplacian) defined as follows:

$$L = \begin{cases} D - A & \text{(Ratio Cut)} \\ I - D^{-1/2} A D^{-1/2} & \text{(Normalized Cut)} \end{cases} \qquad (9)$$

where $D = \mathrm{diag}(d_1, d_2, \dots, d_n)$ is the diagonal degree matrix. Akin to block
model approximation, we solve the following spectral clustering problem based on a
relaxation of $S$:

$$\min_{S} Tr(S^T L S) \quad \text{s.t. } S^T S = I_k \qquad (10)$$

Then, $S$ corresponds to the eigenvectors of $L$ with the smallest eigenvalues.
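The link between the cut objectives in Eqs. (6) and (7) and the trace form in Eq. (8) can be checked numerically. The sketch below assumes the usual scaled indicators (column $c$ equal to $\mathbf{1}_{C_c}/\sqrt{|C_c|}$ for Ratio Cut and $D^{1/2}\mathbf{1}_{C_c}/\sqrt{vol(C_c)}$ for Normalized Cut); this scaling is a standard convention that the text leaves implicit, and the toy graph and variable names are illustrative.

```python
import numpy as np

# Hypothetical toy network: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

deg = A.sum(axis=1)
D = np.diag(deg)
labels = np.array([0, 0, 0, 1, 1, 1])
k = 2

def cut(A, mask):
    """Number of edges between the nodes selected by `mask` and all other nodes."""
    return A[np.ix_(mask, ~mask)].sum()

# Eqs. (6) and (7) evaluated directly from their definitions.
ratio_cut = sum(cut(A, labels == c) / np.sum(labels == c) for c in range(k))
norm_cut = sum(cut(A, labels == c) / deg[labels == c].sum() for c in range(k))

# Scaled indicator matrices (assumed convention, see the lead-in).
H = np.eye(k)[labels]                          # n x k with entries in {0, 1}
S_ratio = H / np.sqrt(H.sum(axis=0))           # divide column c by sqrt(|C_c|)
S_norm = np.sqrt(D) @ H / np.sqrt(deg @ H)     # D^{1/2} 1_C / sqrt(vol(C))

# The two graph Laplacians of Eq. (9).
L_ratio = D - A
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
L_norm = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt

trace_ratio = np.trace(S_ratio.T @ L_ratio @ S_ratio)
trace_norm = np.trace(S_norm.T @ L_norm @ S_norm)
```

With these scalings the trace values coincide with the cut values, which is why relaxing $S$ under $S^TS = I_k$ leads to an eigenvector problem on $L$.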


2.4  Modularity  Maximization 

Modularity [26] is proposed specifically to measure the strength of a community
partition for real-world networks by taking into account the degree
distribution of nodes. Given a random network with $n$ nodes and $m$ edges,
the expected number of edges between nodes $i$ and $j$ is $d_i d_j / 2m$, where $d_i$ and
$d_j$ are the degrees of nodes $i$ and $j$, respectively. So $A_{ij} - d_i d_j / 2m$ measures
how far the network interaction between nodes $i$ and $j$ ($A_{ij}$) deviates from
the expected random connections. Given a group of nodes $C$, the strength of
community effect is defined as

$$\sum_{i \in C, j \in C} \left( A_{ij} - d_i d_j / 2m \right).$$

If a network is partitioned into multiple groups, the overall community effect
can be summed up as follows:

$$\sum_{C} \sum_{i \in C, j \in C} \left( A_{ij} - d_i d_j / 2m \right).$$

Fig. 3: A Unified View of Representative Community Detection Methods


Modularity is defined as

$$Q = \frac{1}{2m} \sum_{C} \sum_{i \in C, j \in C} \left( A_{ij} - d_i d_j / 2m \right) \qquad (11)$$

where the coefficient $1/2m$ is introduced to normalize the value between -1
and 1. Modularity calibrates the quality of community partitions and thus can be
used as an objective measure to optimize.

Let $B = A - \frac{\mathbf{d}\mathbf{d}^T}{2m}$, where $\mathbf{d}$ is the vector of node degrees, let
$s_C \in \{0, 1\}^n$ be the community indicator of group $C$,
and $S$ the community indicator matrix. It follows that

$$Q = \frac{1}{2m} \sum_{C} s_C^T B s_C = \frac{1}{2m} Tr(S^T B S) = Tr(S^T \tilde{B} S) \qquad (12)$$

where

$$\tilde{B} = \frac{1}{2m} B = \frac{A}{2m} - \frac{\mathbf{d}\mathbf{d}^T}{(2m)^2}. \qquad (13)$$

With a spectral relaxation to allow $S$ to be continuous, the optimal $S$ can be
computed as the top-$k$ eigenvectors of matrix $\tilde{B}$ [25] with maximum eigenvalues.
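The equivalence between the pairwise definition in Eq. (11) and the trace form in Eq. (12) can be verified on a small example. The following sketch uses an assumed toy graph and two hypothetical helper functions; it is meant as a numerical check rather than an efficient implementation.

```python
import numpy as np

def modularity_direct(A, labels):
    """Eq. (11): sum of A_ij - d_i d_j / 2m over within-community pairs, over 2m."""
    d = A.sum(axis=1)
    two_m = d.sum()
    same = labels[:, None] == labels[None, :]  # True where i and j share a community
    return ((A - np.outer(d, d) / two_m) * same).sum() / two_m

def modularity_trace(A, labels, k):
    """Eq. (12): Q = Tr(S^T B_tilde S) with the 0/1 indicator matrix S."""
    d = A.sum(axis=1)
    two_m = d.sum()
    B_tilde = A / two_m - np.outer(d, d) / two_m ** 2
    S = np.eye(k)[labels]
    return np.trace(S.T @ B_tilde @ S)

# Hypothetical toy network: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

labels = np.array([0, 0, 0, 1, 1, 1])
q_direct = modularity_direct(A, labels)
q_trace = modularity_trace(A, labels, k=2)
q_single = modularity_direct(A, np.zeros(6, dtype=int))
```

For this graph $2m = 14$ and each triangle has volume 7, so both forms give $Q = 5/14$, while placing all nodes in one community gives $Q = 0$.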


2.5  A  Unified  View 


In the previous subsections, we briefly presented four representative community
detection methods: latent space models, block model approximation, spectral
clustering and modularity maximization. Interestingly, all these methods can
be unified in a process as in Figure 3. The process is composed of 4 components
with 3 intermediate steps. Given a network, a utility matrix is constructed.
Depending on the objective function, different utility matrices can be constructed:

$$\text{Utility Matrix } M = \begin{cases} \tilde{P} \text{ in Eq. (2)} & \text{(latent space models)} \\ A \text{ in Eq. (4)} & \text{(block model approximation)} \\ L \text{ in Eq. (9)} & \text{(spectral clustering)} \\ \tilde{B} \text{ in Eq. (13)} & \text{(modularity maximization)} \end{cases} \qquad (14)$$

After obtaining the utility matrix, we obtain the structural features $S$ via
the top eigenvectors with largest (or smallest, depending on the formulation)
eigenvalues. The selected eigenvectors capture the prominent interaction patterns,
representing approximate community partitions. This step can also be considered
as a de-noising process since we only keep those top eigenvectors that
are indicative of community structures. To recover the discrete partition $H$,
a k-means clustering algorithm is applied. Note that the aforementioned
approaches differ only in the utility matrix they construct.
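The unified process (utility matrix, then eigenvectors, then k-means) can be sketched in a few lines. The code below is an illustrative assumption rather than the authors' implementation: it covers three of the four utility matrices (the latent space matrix is omitted for brevity), uses the Ratio Cut Laplacian for spectral clustering, and relies on a small deterministic k-means helper written for this example.

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Plain Lloyd iterations with a deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def utility_matrix(A, method):
    """Return the utility matrix M and which end of its spectrum to keep."""
    d = A.sum(axis=1)
    if method == "block":
        return A, "largest"
    if method == "spectral":
        return np.diag(d) - A, "smallest"      # Ratio Cut Laplacian D - A
    if method == "modularity":
        two_m = d.sum()
        return A / two_m - np.outer(d, d) / two_m ** 2, "largest"
    raise ValueError(method)

def detect(A, k, ell, method):
    """Utility matrix -> ell structural features -> k-means partition."""
    M, end = utility_matrix(A, method)
    vals, vecs = np.linalg.eigh(M)             # eigenvalues in ascending order
    S = vecs[:, :ell] if end == "smallest" else vecs[:, -ell:]
    return kmeans(S, k)

# Hypothetical toy network: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

results = {m: detect(A, k=2, ell=2, method=m).tolist()
           for m in ("block", "spectral", "modularity")}
```

On this toy graph all three choices of utility matrix separate the two triangles; only `utility_matrix` changes between methods, which mirrors the point made in the text.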

The community detection methods presented above, except the latent space
model, are normally applicable to most medium-size networks (say, 100,000
nodes). The latent space model requires as input a proximity matrix of
the geodesic distances between all pairs of nodes, which costs $O(n^3)$ to compute the
pairwise shortest path distances. Moreover, the utility matrix of the latent
space model is neither sparse nor structured, leading to $O(n^3)$ time to compute its
eigenvectors. This high computational cost hinders its application to real-world
large-scale networks.

On the contrary, the other methods (block model approximation, spectral
clustering, and modularity maximization) construct a sparse or structured (a
sparse matrix plus low-rank update) utility matrix, whose construction cost
is almost negligible$^2$. Asymptotically, the cost to construct a utility matrix is

$$T_{utility} = O(m). \qquad (15)$$

The Implicitly Restarted Lanczos method (IRLM) can be applied to compute the
top eigenvectors efficiently [10,42]. Let $\ell$ denote the number of structural features
to extract. If one makes the conservative assumption that $O(\ell)$ extra
Lanczos steps are involved, IRLM has the worst-case time complexity of

$$T_{eig} = O(h(m\ell + n\ell^2 + \ell^3)) \qquad (16)$$

where $h$, $m$ and $n$ are the number of iterations, the number of edges and the number of
nodes in the network, respectively. Typically, $m \sim O(n)$ in a social network
with power law distribution [34] and $\ell \ll n$. In practice, the computation
tends to be linear with respect to $n$ if $\ell$ is small. The post-processing step
to extract the community partition relies on k-means clustering, which has time
complexity

$$T_{kmeans} = O(nk\ell e) \qquad (17)$$

where $\ell$ is the number of structural features and $e$ is the number of iterations.

In summary, the representative community detection methods can be unified
in the same process. The only difference is how to construct the utility
matrix. This also affects the time complexity of different methods. Block model
approximation, spectral clustering, and modularity maximization share similar
time complexity and can be solved efficiently. With this unified view,
we can systematically study different strategies to handle multi-dimensional
networks.
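The remark that the modularity utility matrix is dense but structured can be illustrated with a matrix-free eigen-solver. The sketch below assumes SciPy is available and uses `scipy.sparse.linalg.eigsh`, which wraps ARPACK's implicitly restarted Lanczos method; the operator only needs a sparse product with $A$ and a rank-one correction, so $\tilde{B}$ is never formed. The function name `modularity_operator` and the toy graph are illustrative assumptions.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import LinearOperator, eigsh

def modularity_operator(A_sparse):
    """Wrap B_tilde = A/2m - d d^T/(2m)^2 as a matrix-free linear operator."""
    n = A_sparse.shape[0]
    d = np.asarray(A_sparse.sum(axis=1)).ravel()
    two_m = d.sum()

    def matvec(x):
        x = np.asarray(x).ravel()
        # Sparse product costs O(m); the rank-one term costs O(n).
        return A_sparse @ x / two_m - d * (d @ x) / two_m ** 2

    return LinearOperator((n, n), matvec=matvec, dtype=float)

# Hypothetical toy network: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
rows = [i for i, j in edges] + [j for i, j in edges]
cols = [j for i, j in edges] + [i for i, j in edges]
A_sparse = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(6, 6))

op = modularity_operator(A_sparse)
vals, vecs = eigsh(op, k=2, which="LA")        # two largest algebraic eigenvalues
top = vecs[:, np.argmax(vals)]                 # leading structural feature
```

For this small graph the leading eigenvalue of $\tilde{B}$ is $\sqrt{3}/14$, and the sign pattern of `top` separates the two triangles; on large sparse networks the same operator keeps each Lanczos step at roughly $O(m + n)$ cost.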

2 The utility matrix of modularity maximization is dense but structured, so it is rarely
computed explicitly. Its structure is exploited directly for eigenvector computation [25,36].

3 Community Detection in M-D Networks

In the previous section, we reviewed various methods of community detection
in 1-D networks and presented a unified view. Here, we systematically study
possible strategies to extend community detection from 1-D networks to M-D
networks. Before we proceed, we first state the problem of M-D network community
detection. A $d$-dimensional network is represented as

$$\mathcal{A} = \{A^{(1)}, A^{(2)}, \dots, A^{(d)}\}$$

with $A^{(i)}$ representing the interaction among actors in the $i$-th dimension,
satisfying

$$A^{(i)} \in \mathbb{R}_+^{n \times n}, \quad A^{(i)} = (A^{(i)})^T, \quad i = 1, 2, \dots, d$$

where $n$ is the total number of actors involved in the network. Here, we concentrate
on symmetric networks$^3$. In a multi-dimensional network, the interactions
of actors are represented in various forms. In certain scenarios, a latent community
structure exists among actors, which explains these interactions. The
goal of this work is to infer the shared latent community structure among the
actors given a multi-dimensional network. In particular, we attempt to find
a community assignment such that a utility measure (e.g., block model
approximation error, modularity) is optimized for each dimension.

In order to find the shared community structure across multiple network
dimensions, we have to integrate the information from all dimensions.
Since four components (network, utility matrix, structural features and partition)
are involved throughout the community detection process (Figure 3), we
can conduct the integration in terms of each component as in Figure 4. In particular,
we have Network Integration, Utility Integration, Feature Integration,
and Partition Integration. Below, we delineate each type of integration strategy
in detail. We use modularity maximization as an example to go through all the
different strategies$^4$. The derivation of other variants of community detection
methods (such as block models and spectral clustering) in multi-dimensional
networks following the unified view should be straightforward.


3.1  Network  Integration 

A simple strategy to handle a multi-dimensional network is to treat it as
single-dimensional. One straightforward approach is to calculate the average
interaction network among social actors:

$$\bar{A} = \frac{1}{d} \sum_{i=1}^{d} A^{(i)} \qquad (18)$$

3 Directed networks can be converted into undirected networks through certain operations
as shown later.

4 A preliminary work based on modularity maximization is published in [36]. This
manuscript, significantly different from the previous conference version, presents a general
framework to interpret community detection in multi-dimensional networks.

Fig. 4: Potential Multi-Dimensional Integration Strategies (Network Integration, Utility Integration, Feature Integration, Partition Integration)


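Correspondingly,

$$\bar{m} = \frac{1}{d} \sum_{i=1}^{d} m^{(i)}, \qquad \bar{\mathbf{d}} = \frac{1}{d} \sum_{i=1}^{d} \mathbf{d}^{(i)} \qquad (19)$$

where $m^{(i)}$ and $\mathbf{d}^{(i)}$ denote the number of edges and the degree vector in the
$i$-th dimension. With $\bar{A}$, this boils down to classical community detection in a
single-dimensional network. Based on the average network, we can follow the community
detection process as stated in the unified view. Taking modularity maximization as
an example, we can maximize the modularity as follows:

$$\max_{S} \frac{1}{2\bar{m}} Tr\left( S^T \left[ \bar{A} - \frac{\bar{\mathbf{d}}\bar{\mathbf{d}}^T}{2\bar{m}} \right] S \right) \qquad (20)$$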
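Network integration amounts to a single averaging step followed by ordinary one-dimensional community detection. The sketch below builds $\bar{A}$, $\bar{\mathbf{d}}$ and $\bar{m}$ as in Eqs. (18) and (19) and the corresponding modularity utility matrix; the two-dimensional toy network and the names `two_triangles` and `network_integration` are assumptions made for illustration.

```python
import numpy as np

def two_triangles(bridge):
    """Hypothetical toy dimension: triangles {0,1,2} and {3,4,5} plus one bridging edge."""
    A = np.zeros((6, 6))
    for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), bridge]:
        A[i, j] = A[j, i] = 1.0
    return A

def network_integration(adjs):
    """Average the adjacency matrices and form the modularity utility matrix."""
    A_bar = np.mean(adjs, axis=0)
    d_bar = A_bar.sum(axis=1)                  # mean of the per-dimension degree vectors
    two_m_bar = d_bar.sum()                    # twice the mean number of edges
    M = A_bar / two_m_bar - np.outer(d_bar, d_bar) / two_m_bar ** 2
    return A_bar, d_bar, two_m_bar / 2.0, M

adjs = [two_triangles((2, 3)), two_triangles((0, 5))]
A_bar, d_bar, m_bar, M = network_integration(adjs)

vals, vecs = np.linalg.eigh(M)                 # eigenvalues in ascending order
top = vecs[:, -1]                              # leading eigenvector of the utility matrix
```

Because both toy dimensions share the same two communities, the leading eigenvector of the averaged utility matrix still splits the nodes into the two triangles.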


3.2  Utility  Integration 


Another variant for integration is to combine utility matrices instead of networks.
We can obtain an average utility matrix as follows:

$$\bar{M} = \frac{1}{d} \sum_{i=1}^{d} M^{(i)}$$

where $M^{(i)}$ denotes the utility matrix constructed in the $i$-th dimension. The
community indicators can be computed via the top eigenvectors of the utility
matrix. This is equivalent to optimizing the objective function over all the dimensions
simultaneously. As for modularity maximization, the average utility
matrix in this case would be

$$\bar{M} = \frac{1}{d} \sum_{i=1}^{d} \tilde{B}^{(i)} = \frac{1}{d} \sum_{i=1}^{d} \left\{ \frac{A^{(i)}}{2m^{(i)}} - \frac{\mathbf{d}^{(i)} (\mathbf{d}^{(i)})^T}{(2m^{(i)})^2} \right\} \qquad (21)$$

Finding the top eigenvectors of the average utility matrix is equivalent to
maximizing the average modularity as follows:

$$\max_{S} \frac{1}{d} \sum_{i=1}^{d} Tr(S^T \tilde{B}^{(i)} S) = \max_{S} Tr(S^T \bar{M} S) \qquad (22)$$

3.3  Feature  Integration 

We can also perform the integration over the structural features extracted from
each dimension of the network. One might conjecture that we can perform
similar operations as we did for network interactions and utility matrices, i.e.,
taking the average of structural features as follows:

$$\bar{S} = \frac{1}{d} \sum_{i=1}^{d} S^{(i)} \qquad (23)$$

Unfortunately, this straightforward extension does not apply to structural features,
because the solution $S$ which optimizes the utility function is not unique.
Dissimilar structural features do not suggest that the corresponding latent
community structures are drastically different. For example, let $S$ be the top-$\ell$
eigenvectors that maximize modularity $Q$, and $V$ an orthonormal matrix such
that

$$V \in \mathbb{R}^{\ell \times \ell}, \quad VV^T = I_\ell, \quad V^TV = I_\ell.$$

It can be verified that $SV$ also maximizes $Q$:

$$\frac{1}{2m} tr((SV)^T B (SV)) = \frac{1}{2m} tr(S^T B S V V^T) = \frac{1}{2m} tr(S^T B S) = Q_{max}$$

Essentially, $SV$ and $S$ are equivalent under an orthogonal transformation. In
the simplest case, $S' = -S$ is also a valid solution. Averaging these structural
features does not result in sensible features.
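The non-uniqueness argument can be reproduced numerically. The sketch below draws a random orthonormal $V$ (via a QR factorization, an illustrative choice), confirms that $S$ and $SV$ attain the same objective value, and shows that averaging $S$ with the equally valid solution $-S$ yields all zeros. The toy graph and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy network: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

d = A.sum(axis=1)
two_m = d.sum()
B = A - np.outer(d, d) / two_m

ell = 2
vals, vecs = np.linalg.eigh(B)                 # eigenvalues in ascending order
S = vecs[:, -ell:]                             # top-ell eigenvectors

V, _ = np.linalg.qr(rng.standard_normal((ell, ell)))   # random orthonormal matrix
q_S = np.trace(S.T @ B @ S) / two_m
q_SV = np.trace((S @ V).T @ B @ (S @ V)) / two_m

avg_flip = (S + (-S)) / 2                      # naive average of two valid solutions
```

Both `q_S` and `q_SV` take the same value, while `avg_flip` carries no information at all, which is why the features must be aligned before they are combined.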

Alternatively, we expect the structural features of different dimensions to
be highly correlated after certain transformations. To capture the correlations
between multiple sets of variables, (generalized) canonical correlation analysis
(CCA) [14,18] is the standard statistical technique. CCA attempts to find a
transformation for each set of variables such that the pairwise correlations are
maximized. Here we briefly illustrate one scheme of generalized CCA which
turns out to be equivalent to principal component analysis (PCA) in our specific case.

Let $S^{(i)} \in \mathbb{R}^{n \times \ell}$ denote the structural features extracted from the $i$-th
dimension of the network, and $w_i \in \mathbb{R}^{\ell}$ be the linear transformation applied
to structural features of dimension $i$. The correlation between two sets of
structural features after transformation is

$$(S^{(i)} w_i)^T (S^{(j)} w_j) = w_i^T \left( (S^{(i)})^T S^{(j)} \right) w_j = w_i^T C_{ij} w_j$$

with $C_{ij} = (S^{(i)})^T S^{(j)}$ representing the covariance between the structural
features of the $i$-th and the $j$-th dimensions. Generalized CCA attempts to
maximize the summation of pairwise correlations as in the following form:

$$\max \sum_{i=1}^{d} \sum_{j=1}^{d} w_i^T C_{ij} w_j \qquad (24)$$

$$\text{s.t. } \sum_{i=1}^{d} w_i^T C_{ii} w_i = 1 \qquad (25)$$

Using standard Lagrange multipliers and setting the derivatives with respect to $w_i$
to zero, we obtain the equation below:

$$\begin{bmatrix} C_{11} & C_{12} & \cdots & C_{1d} \\ C_{21} & C_{22} & \cdots & C_{2d} \\ \vdots & \vdots & \ddots & \vdots \\ C_{d1} & C_{d2} & \cdots & C_{dd} \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_d \end{bmatrix} = \lambda \begin{bmatrix} C_{11} & 0 & \cdots & 0 \\ 0 & C_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & C_{dd} \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_d \end{bmatrix} \qquad (26)$$

Recall that the structural features extracted from each dimension are essentially
the top eigenvectors of the utility matrix, satisfying $(S^{(i)})^T S^{(i)} = I$.
Thus, the matrix $\mathrm{diag}(C_{11}, C_{22}, \dots, C_{dd})$ in Eq. (26) becomes an identity matrix.
Hence $w = [w_1^T, w_2^T, \dots, w_d^T]^T$ corresponds to the top eigenvector of the
full covariance matrix on the left side of Eq. (26), which is equivalent to PCA
applied to data of the following form:

$$X = [S^{(1)}, S^{(2)}, \dots, S^{(d)}] \qquad (27)$$

Suppose the SVD of $X$ is $X = U \Sigma V^T$; then $w$ corresponds to the first column
of $V$. Thus we have

$$\frac{1}{d} \sum_{i=1}^{d} S^{(i)} w_i = \frac{1}{d} [S^{(1)}, S^{(2)}, \dots, S^{(d)}]\, w = \frac{1}{d} X V_1 = \frac{\sigma_1}{d} U_1$$

Since $\sigma_1 / d$ is a scalar, $U_1$ is essentially the average feature value of each actor
after we aggregate the structural features of different dimensions along with
the transformation $w$. There are $k - 1$ degrees of freedom with $k$ communities.
To compute the $(k - 1)$-dimensional embedding, we just need to project the data
$X$ onto the top $(k - 1)$ principal vectors. It follows that the top $(k - 1)$ vectors
of $U$ are the aggregated structural features.

Algorithm: Structural Feature Integration
Input: $Net = \{A^{(1)}, A^{(2)}, \dots, A^{(d)}\}$,
       number of communities $k$,
       number of structural features to extract $\ell$;
Output: community assignment idx.
1. Compute the top $\ell$ eigenvectors $S^{(i)}$ of the utility matrix of each dimension as stated in Eq. (14);
2. Compute the slim SVD of $X = [S^{(1)}, S^{(2)}, \dots, S^{(d)}] = U D V^T$;
3. Obtain the lower-dimensional embedding $\tilde{U} = U(:, 1:k-1)$;
4. Normalize the rows of $\tilde{U}$ to unit length;
5. Calculate the cluster idx with k-means on $\tilde{U}$.

Fig. 5: Algorithm: Structural Feature Integration for Multi-Dimensional Networks


The detailed structural feature integration algorithm is summarized in Figure 5.
In summary, we first extract structural features from each dimension
of the network via representative community detection methods; then PCA
is applied on the concatenated data as in Eq. (27) to select the top eigenvectors.
After projecting the data onto the principal vectors, we obtain a lower-dimensional
embedding which captures the principal pattern across all the
dimensions of the network. Then we can perform k-means on this embedding
to find the discrete community assignment.
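The five steps of the structural feature integration algorithm can be sketched as follows. This is a minimal illustration under stated assumptions: modularity is used as the utility matrix, the toy network has two dimensions sharing the same two communities, $\ell = 1$ feature is extracted per dimension, and the deterministic `kmeans` helper is written for this example only.

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Plain Lloyd iterations with a deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def structural_features(A, ell):
    """Step 1: top-ell eigenvectors of the modularity utility matrix of one dimension."""
    d = A.sum(axis=1)
    two_m = d.sum()
    B_tilde = A / two_m - np.outer(d, d) / two_m ** 2
    vals, vecs = np.linalg.eigh(B_tilde)       # eigenvalues in ascending order
    return vecs[:, -ell:]

def feature_integration(adjs, k, ell):
    X = np.hstack([structural_features(A, ell) for A in adjs])   # concatenated features
    U, D, Vt = np.linalg.svd(X, full_matrices=False)             # step 2: slim SVD
    U_tilde = U[:, :k - 1]                                       # step 3: top k-1 vectors
    norms = np.linalg.norm(U_tilde, axis=1, keepdims=True)
    U_tilde = U_tilde / np.maximum(norms, 1e-12)                 # step 4: unit-length rows
    return kmeans(U_tilde, k)                                    # step 5: k-means

def two_triangles(bridge):
    """Hypothetical toy dimension: triangles {0,1,2} and {3,4,5} plus one bridging edge."""
    A = np.zeros((6, 6))
    for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), bridge]:
        A[i, j] = A[j, i] = 1.0
    return A

idx = feature_integration([two_triangles((2, 3)), two_triangles((0, 5))], k=2, ell=1)
```

The eigenvectors returned for the two dimensions may differ in sign, yet the SVD aligns them, and `idx` groups the nodes into the two triangles.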


3.4  Partition  Integration 

Partition integration takes effect after the community partition of each network dimension is ready. This problem has been studied as the cluster ensemble problem [32], which combines multiple clustering results of the same data from a variety of sources into a single consensus clustering. Strehl and Ghosh [32] propose three effective and comparable approaches: the Cluster-based Similarity Partitioning Algorithm (CSPA), the HyperGraph Partitioning Algorithm (HGPA), and the Meta-Clustering Algorithm (MCLA). For brevity, we only present the basic idea of CSPA here. CSPA constructs a similarity matrix from each clustering. The similarity of two objects is 1 if they belong to the same group and 0 if they belong to different groups. Let $H^{(i)} \in \{0,1\}^{n \times k}$ denote the community indicator matrix of the clustering based on interactions at dimension $i$. The similarity between nodes can be computed as

$$\frac{1}{d} \sum_{i=1}^{d} H^{(i)} (H^{(i)})^T = \frac{1}{d} H H^T, \quad \text{where } H = [H^{(1)}, H^{(2)}, \ldots, H^{(d)}].$$

Based on this similarity matrix between nodes, we can apply the similarity-based community detection methods we introduced before to find clusters. A disadvantage of CSPA is that the computed similarity matrix can be dense, which makes it impractical for large networks. Instead, we can treat $H$ as the feature representation of actors and cluster them with k-means directly. Intuitively, if two actors are assigned to the same group in the majority of dimensions, they would share features. Thus, the two actors are likely to reside within the same community in the final consensus cluster as well.
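The CSPA similarity and the concatenated indicator matrix $H$ can be illustrated with a small sketch. The helper names `indicator_matrix` and `cspa_similarity` are hypothetical, and each partition is assumed to be given as a vector of integer labels in the range 0 to k-1.

```python
import numpy as np

def indicator_matrix(labels, k):
    """Build the n x k community indicator matrix H^(i) from integer labels."""
    labels = np.asarray(labels)
    H = np.zeros((labels.size, k))
    H[np.arange(labels.size), labels] = 1.0
    return H

def cspa_similarity(partitions, k):
    """Return (similarity, H), where H concatenates all indicator matrices.

    similarity[u, v] is the fraction of dimensions in which nodes u and v
    are assigned to the same community.
    """
    H = np.hstack([indicator_matrix(p, k) for p in partitions])
    return H @ H.T / len(partitions), H
```

For large networks the dense n x n similarity matrix would not be formed; the sparse matrix `H` alone can be clustered with k-means, as the text suggests.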


3.5  Summary 

In  previous  subsections,  we  have  described  different  strategies  to  integrate 
multi-dimensional  network  information.  Here,  we  summarize  the  pros  and  cons 
of  different  schemes. 

Network  integration,  which  simply  averages  the  network  interactions  in 
different  dimensions,  can  be  problematic  if  the  interactions  are  not  comparable. 
In  reality,  actors  often  participate  in  different  dimensions  of  a  network  with 
varied  intensity.  Even  within  the  same  group,  the  interaction  can  be  very  sparse 
in  one  dimension  but  relatively  more  observable  in  another  dimension.  So  if 
there  is  one  dimension  with  intensive  interactions,  simply  averaging  all  the 
dimensions  would  overwhelm  the  structural  information  in  other  dimensions. 

Utility integration sums up all the utility matrices. This combination is consistent with the overall objective, but it is unclear whether the utility function is directly comparable across different dimensions. For instance, in modularity maximization, the modularity depends heavily on the density of interactions as well as on the community structure. Is the average of the utility functions a reasonable choice? If not, how can we normalize them so that the utilities in different dimensions are comparable? These questions are examined further in the empirical study.

Feature integration identifies transformations such that the structural features of different dimensions become highly correlated. The transformations map structural features into the same space, so aggregation is viable. Note that the extraction of structural features helps reduce the noise in each dimension of the network. Hence, feature integration is expected to be more robust than the other methods.

Partition integration relies on discrete hard clusterings. Note that the classical k-means clustering algorithm normally finds a local optimum and is highly sensitive to the initial condition. Though k-means is applied in all the schemes, partition integration applies k-means to each dimension of the network to find the partitions, which introduces more uncertainty and is hence likely to yield results with relatively high variance.


4  Empirical  Study 

Input:  Net = {$A^{(1)}, A^{(2)}, \ldots, A^{(d)}$},
        a multi-dimensional integration scheme $f$;
Output: a community quality measure $Q^{(p)}$ for each network dimension.

1. for p = 1, ..., d
2.    hold out $A^{(p)}$ for testing;
3.    obtain community structure $H$ by applying $f$ to the training dimensions
      {$A^{(1)}, \ldots, A^{(p-1)}, A^{(p+1)}, \ldots, A^{(d)}$};
4.    compute the quality measure $Q^{(p)}$ based on $H$ and $A^{(p)}$;
5. end

Fig. 6: CDNV: Cross-Dimension Network Validation

We now discuss evaluation methods that are suitable for multi-dimensional networks. The ideal case is that we know the community memberships a priori, the so-called ground truth. We can then adopt the commonly used normalized mutual information (NMI) [32]. Let $\pi^a$, $\pi^b$ denote two different partitions of communities. NMI is defined as

$$NMI(\pi^a, \pi^b) = \frac{\sum_{h=1}^{k^{(a)}} \sum_{\ell=1}^{k^{(b)}} n_{h,\ell} \log\left(\frac{n \cdot n_{h,\ell}}{n_h^{(a)} \cdot n_\ell^{(b)}}\right)}{\sqrt{\left(\sum_{h=1}^{k^{(a)}} n_h^{(a)} \log\frac{n_h^{(a)}}{n}\right) \left(\sum_{\ell=1}^{k^{(b)}} n_\ell^{(b)} \log\frac{n_\ell^{(b)}}{n}\right)}} \qquad (28)$$

where $n$ is the total number of data instances, $k^{(a)}$ and $k^{(b)}$ represent the numbers of communities in partitions $\pi^a$ and $\pi^b$ respectively, and $n_h^{(a)}$, $n_\ell^{(b)}$ and $n_{h,\ell}$ are, respectively, the numbers of actors in the $h$-th community of partition $\pi^a$, in the $\ell$-th community of partition $\pi^b$, and in both the $h$-th community of $\pi^a$ and the $\ell$-th community of $\pi^b$. NMI is a measure between 0 and 1. NMI is equal to 1 when $\pi^a$ and $\pi^b$ are equivalent.
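The NMI of Eq. (28) can be computed directly from the contingency counts. The following is a minimal sketch under the assumption that each partition is given as a label vector and contains at least two communities (otherwise the denominator is zero); the function name `nmi` is illustrative.

```python
import numpy as np

def nmi(labels_a, labels_b):
    """Normalized mutual information between two partitions (Eq. (28))."""
    a = np.asarray(labels_a)
    b = np.asarray(labels_b)
    n = a.size
    # Map arbitrary labels to 0..k-1 so they can index a contingency table.
    _, a_idx = np.unique(a, return_inverse=True)
    _, b_idx = np.unique(b, return_inverse=True)
    counts = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(counts, (a_idx, b_idx), 1)       # counts[h, l] = n_{h,l}
    n_a = counts.sum(axis=1)                   # community sizes in partition a
    n_b = counts.sum(axis=0)                   # community sizes in partition b
    expected = np.outer(n_a, n_b)
    nz = counts > 0                            # empty cells contribute 0
    numerator = np.sum(counts[nz] * np.log(n * counts[nz] / expected[nz]))
    denominator = np.sqrt(np.sum(n_a * np.log(n_a / n)) *
                          np.sum(n_b * np.log(n_b / n)))
    return numerator / denominator
```

Because only co-membership counts enter the formula, the measure is invariant to how the communities are labeled.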

When ground truth is not available, an alternative evaluation method is needed to quantify the community structures extracted by different integration strategies. If a latent community structure is shared across network dimensions, we can perform cross-dimension network validation (CDNV) as in Figure 6. Given a multi-dimensional network Net = {$A^{(i)} \mid 1 \le i \le d$}, we can learn a community structure from $d-1$ dimensions of the network and check how well the structure matches the left-out dimension ($A^{(p)}$). In other words, we use $d-1$ dimensions for training and the remaining one for testing. During training, we obtain some communities ($C$), and use $C$ to calculate the modularity for the data of $A^{(p)}$ as follows:

$$Q = \frac{1}{2m} \sum_{C} \sum_{i \in C, j \in C} \left( A_{ij} - \frac{d_i d_j}{2m} \right). \qquad (29)$$

A larger modularity implies that a more accurate community structure is discovered using the training data.
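The validation loop of Figure 6 together with the modularity of Eq. (29) can be sketched as follows. The integration scheme is passed in as a callable `integrate` that maps a list of training adjacency matrices to a label vector; this interface and the function names are assumptions made for illustration. Here $m$ is taken to be the number of edges and $d_i$ the degree of node $i$ in the held-out dimension.

```python
import numpy as np

def modularity(A, labels):
    """Modularity Q of a partition on adjacency matrix A (Eq. (29))."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    d = A.sum(axis=1)                                  # node degrees
    two_m = d.sum()                                    # 2m
    same = labels[:, None] == labels[None, :]          # pairs in the same community
    B = A - np.outer(d, d) / two_m
    return B[same].sum() / two_m

def cdnv(networks, integrate):
    """Hold out each dimension in turn and score the learned communities on it."""
    networks = list(networks)
    scores = []
    for p, test_dim in enumerate(networks):
        training = networks[:p] + networks[p + 1:]
        labels = integrate(training)                   # communities from d-1 dimensions
        scores.append(modularity(test_dim, labels))
    return scores
```

Averaging the returned scores, or ranking methods per held-out dimension, gives the comparison used in the experiments.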

The above two evaluation methods are designed for different contexts: NMI is suitable for data with known ground truth and CDNV for data without. If we can establish their relationship, we can then determine whether CDNV can be used to compare different integration strategies for community detection. One way to establish the relationship between NMI and CDNV is to employ synthetic data.

Fig. 7: One example of 4-Dimensional Network

Fig. 8: Performance of different community detection methods. A1, A2, A3 and A4 show the performance on a single dimension. N, U, F, and P denote network, utility, feature, and partition integration, respectively.


To recap, we use modularity maximization to produce the utility matrix $M$, compute structural features $S$ via spectral analysis, and apply k-means to find community partitions⁵ (Figure 3). Different integration strategies can be applied at various stages as shown in Figure 4. The baseline strategies do not integrate the $(d-1)$ training dimensions but use a single dimension instead.


4.1  Experiments  on  Synthetic  Data 

The synthetic data has 3 groups with 50, 100, and 200 members, respectively. There are 4 dimensions of interactions among these 350 members. For each dimension of the network, we sample a within-group interaction probability for each group from a uniform distribution. Based on the within-group interaction probability, interactions occur between members following a Bernoulli distribution. Noise is also added by randomly connecting two nodes in the network. Since we have the group membership information for the synthetic data, NMI (Eq. (28)) can be employed to evaluate the performance of community detection.
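One dimension of such a synthetic network could be generated along the following lines. The text does not state the range of the uniform distribution or the noise level, so `p_range` and `noise` below are placeholder assumptions, as is the function name `synthetic_dimension`.

```python
import numpy as np

def synthetic_dimension(sizes, rng, p_range=(0.1, 0.5), noise=0.01):
    """Generate one undirected network dimension with planted groups.

    sizes   : group sizes, e.g. [50, 100, 200]
    p_range : assumed range of the uniform within-group probability
    noise   : assumed probability of a random noise edge between any two nodes
    """
    labels = np.repeat(np.arange(len(sizes)), sizes)
    n = labels.size
    within = np.zeros((n, n))
    for g in range(len(sizes)):
        members = np.flatnonzero(labels == g)
        # One interaction probability per group, sampled uniformly.
        within[np.ix_(members, members)] = rng.uniform(*p_range)
    # Bernoulli within-group edges plus random noise edges.
    edges = (rng.random((n, n)) < within) | (rng.random((n, n)) < noise)
    upper = np.triu(edges, k=1)            # keep one draw per node pair, no self-loops
    A = (upper | upper.T).astype(float)    # symmetrize
    return A, labels
```

Calling the function four times with the same `sizes` but fresh random draws yields a 4-dimensional network whose dimensions share the latent groups while differing in interaction density.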


5 Since k-means clustering is sensitive to the initialization, we repeat k-means 5 times and pick whichever is the best as the community partition.

Table 1: Average performance over 100 runs. NMI denotes the average performance of comparing the extracted communities with the latent group membership for generating a network, R_NMI the ranking comparing different strategies based on NMI, CDNV the performance based on cross-dimension network validation, and R_CDNV the ranking based on CDNV. Note that NMI and CDNV yield consistent rankings.

Strategy                                    NMI      R_NMI   CDNV     R_CDNV
Single-Dimensional              -           0.6903   5       0.1413   5
Multi-Dimensional Integration   Network     0.7946   4       0.1739   4
                                Utility     0.9157   2       0.2035   2
                                Feature     0.9351   1       0.2064   1
                                Partition   0.8048   3       0.1785   3


Figure 7 shows one example of the generated multi-dimensional network. Clearly, different dimensions demonstrate different interaction patterns. Figure 8 reports the performance of community detection in terms of NMI, where A1, A2, A3 and A4 denote the performance based on a single dimension (the baseline strategies) and the other four bars show the performance of community detection using the integration strategies corresponding to network (N), utility (U), feature (F) and partition (P) integration, respectively. Clearly, the four methods which integrate information from different network dimensions outperform those on single-dimensional networks. This is easily explained by the patterns shown in Figure 7. The first dimension of the network actually shows only two groups, and the second dimension involves only one group with the other two hidden behind the noise. Thus, using a single view is very unlikely to recover the correct latent community structure. This is indicated by the low NMI of the first two dimensions. Utilizing the information presented in all the dimensions, on the contrary, lets the dimensions compensate for each other and uncovers the shared community structure.

Comparing different integration schemes, feature integration in this case uncovers the true community information exactly, whereas the others do not. Figure 8 shows just one example. To conclude more confidently, we regenerate 100 different synthetic data sets and report the average performance of each method in Table 1. Clearly, multi-dimensional integration schemes outperform single-dimensional community detection methods in terms of NMI. Structural feature integration achieves the best performance with the lowest variance. This is because feature integration denoises the information presented in each dimension and is thus able to obtain a more robust clustering result. Network integration and utility integration, on the other hand, combine the noisy network or utility matrices directly, resulting in inferior performance. Partition integration relies on partitions extracted from each network dimension, and these partitions depend on the clustering algorithm being used. In our case, k-means clustering can produce locally optimal partitions. Structural feature integration is the most effective approach among all of them.

In order to verify the validity of cross-dimension network validation, we hold out one dimension for testing and use the other three dimensions for training. The average performance of CDNV is also shown in Table 1. It is interesting that both evaluation schemes, NMI (with latent community membership information) and CDNV (without true community membership information), yield consistent rankings when comparing different strategies for community detection. Next, we use CDNV to verify how different integration strategies work on real-world data.

Table 2: The density of each dimension in the constructed 5-dimensional network

Network   Dimension         Density
A^(1)     contact           6.74 × 10^-4
A^(2)     co-contact        1.71 × 10^-2
A^(3)     co-subscription   4.90 × 10^-2
A^(4)     co-subscribed     1.97 × 10^-2
A^(5)     favorite          3.34 × 10^-2


4.2  Experiments  on  Social  Media  Data 

We now compare different multi-dimensional integration strategies using YouTube data. We discuss data collection and properties first and then report and analyze experimental findings.

4.2.1 YouTube Data

YouTube⁶ is currently the most popular video sharing web site. It is reported to "attract 100 million video views per day"⁷. As of March 17th, 2008, there have been 78.3 million videos uploaded, with over 200,000 videos uploaded per day⁸. This social networking site allows users to interact with each other in various forms such as contacts, subscriptions, sharing favorite videos, etc. We use the YouTube Data API⁹ to crawl the contacts network, the subscription network, and each user's favorite videos. We choose 100 authors who recently uploaded videos as the seed set for crawling, and expand the network via their contacts and subscriptions. We obtain a small portion of the whole network, with 30,522 user profiles reaching in total 848,003 contacts and 1,299,642 favorite videos. After removing those users who decline to share their contact information, we have 15,088 active user profiles, represented by three different interactions: two adjacency matrices of size 15,088 × 848,003 representing contact relationships and subscriptions, and a matrix of size 15,088 × 1,299,642 representing users' favorite videos.

6 http://www.youtube.com/
7 http://www.usatoday.com/tech/news/2006-07-16-youtube-views_x.htm
8 http://ksudigg.wetpaint.com/page/YouTube+Statistics?t=anon
9 http://code.google.com/apis/youtube/overview.html

Fig. 9: Power law distribution on Different Dimensions

One issue is that the collected subscription network is directional while most community detection methods, such as block models, spectral clustering and modularity maximization, are proposed for undirected networks. For such cases, simply ignoring the direction can confuse the two roles of the directional interaction. Instead, we decompose the asymmetric interaction $A$ into two unidirectional interactions:

$$A' = A A^T; \qquad (30)$$

$$A'' = A^T A. \qquad (31)$$

Essentially, if two social actors both subscribe to the same set of users, it is likely that they are similar and share the same community. On the other hand, if two actors are referred to by the same set of actors, their similarity tends to be higher than that of random pairs. This is similar to the two roles of hub and authority of web pages as mentioned in [19].
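The decomposition in Eqs. (30) and (31) amounts to two matrix products. In the sketch below the convention that `A[i, j] = 1` means "user i subscribes to user j" is an assumption chosen to match the description above, and `decompose_directed` is an illustrative name.

```python
import numpy as np

def decompose_directed(A):
    """Split a directed interaction matrix into two symmetric ones.

    Assumes A[i, j] = 1 when user i subscribes to user j.
    Returns (A A^T, A^T A):
      (A A^T)[i, j] counts users that both i and j subscribe to;
      (A^T A)[i, j] counts users that subscribe to both i and j.
    """
    A = np.asarray(A, dtype=float)
    return A @ A.T, A.T @ A
```

Both products are symmetric by construction, so undirected community detection methods can be applied to each of them. For a crawl of the size reported here, a sparse matrix representation would be needed in practice.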

To utilize all aspects of information in our collected data, we construct a 5-dimensional network:

$A^{(1)}$ contact network: the contact network among those 15,088 active users;

$A^{(2)}$ co-contact network: two active users are connected if they both add another user as contact; this is constructed based on all the reachable 848,003 users (excluding those active ones) in our collected data following Eq. (30);

$A^{(3)}$ co-subscription network: the connection between two users denotes that they subscribe to the same user; constructed following Eq. (30);

$A^{(4)}$ co-subscribed network: two users are connected if they are both subscribed to by the same user; constructed following Eq. (31);

$A^{(5)}$ favorite network: two users are connected if they share favorite videos.

All these different interactions are correlated with user interests. According to the homophily effect well studied in social science [22], people tend to connect to others sharing certain similarities. Thus, we expect that connected friends in the contact network $A^{(1)}$ are more likely to share certain interests. Similarly, if two users both connect to another user or a favorite video (as in $A^{(2)}$, $A^{(3)}$ or $A^{(5)}$), they are likely to share certain interests. On the other hand, if two users are subscribed to by the same set of users (as in $A^{(4)}$), their shared content, and thus their interests, are similar. Essentially, we hope to extract communities that share similar interests by integrating heterogeneous interactions.

Table 3: Performance when actors are clustered into 20, 40, and 60 communities, respectively. In the table, $R^{(i)}$ ($1 \le i \le 5$) denotes the ranking of each method based on CDNV when using $A^{(i)}$ for testing, and R_average the average ranking across all network dimensions. Bold entries denote the best in each column for each case.

k = 20
Strategies                                R^(1)  R^(2)  R^(3)  R^(4)  R^(5)  R_average
Single-Dimensional Community Detection
  A^(1)                                   -      7      8      8      8      7.75
  A^(2)                                   4      -      5      5      6      5.00
  A^(3)                                   6      5      -      4      4      4.75
  A^(4)                                   7      4      4      -      5      5.00
  A^(5)                                   8      6      6      6      -      6.50
Multi-Dimensional Integration
  Network                                 5      8      7      7      7      6.80
  Utility                                 2      2      2      2      2      2.00
  Feature                                 1      1      1      1      1      1.00
  Partition                               3      3      3      3      3      3.00

k = 40
Strategies                                R^(1)  R^(2)  R^(3)  R^(4)  R^(5)  R_average
Single-Dimensional Community Detection
  A^(1)                                   -      8      6      7      8      7.75
  A^(2)                                   4      -      4      5      6      4.75
  A^(3)                                   5      4      -      4      4      4.25
  A^(4)                                   7      6      5      -      7      6.25
  A^(5)                                   8      7      7      6      -      7.00
Multi-Dimensional Integration
  Network                                 6      5      8      8      5      6.40
  Utility                                 2      2      2      3      2      2.20
  Feature                                 1      1      1      2      1      1.20
  Partition                               3      3      3      1      3      2.60

k = 60
Strategies                                R^(1)  R^(2)  R^(3)  R^(4)  R^(5)  R_average
Single-Dimensional Community Detection
  A^(1)                                   -      5      6      7      8      6.50
  A^(2)                                   3      -      5      4      6      4.50
  A^(3)                                   6      6      -      5      7      6.00
  A^(4)                                   7      4      4      -      5      5.00
  A^(5)                                   8      8      7      6      -      7.25
Multi-Dimensional Integration
  Network                                 5      7      8      8      4      6.40
  Utility                                 2      2      2      1      2      1.80
  Feature                                 1      1      1      2      1      1.20
  Partition                               4      3      3      3      3      3.20


Table 2 shows the connection density of each dimension. The contact dimension is the sparsest one, while the other dimensions, due to their construction, are denser. Figure 9 shows the degree distribution in the contacts network and the favorite network. Both follow a power law pattern [8] as expected. This data set is publicly available at the first author's homepage¹⁰.

4.2.2 Comparative Study

The four multi-dimensional integration schemes as well as community detection methods on a single dimension are compared. We cluster actors involved in the network into different numbers of communities. The clustering performance of single-dimensional and multi-dimensional methods when k = 20, 40 and 60 is presented in Table 3. In the table, rows represent methods and columns denote the rankings when a certain network dimension is used as test data. The bold face denotes the best performance in each column for each case. Note that in our cross-dimension network validation procedure, the test dimension is not available during training, thus the diagonal entries for single-dimensional methods are not shown.

10 http://www.public.asu.edu/~ltang9/heterogeneous_network.html

Feature integration is clearly the winner most of the time, except for certain rare cases (e.g., using $A^{(4)}$ as the test dimension when k = 40 or 60). We notice that the rankings of different integration strategies do not change much with different k. A closer examination reveals that utilizing information of all the dimensions (except with network integration) outperforms single-dimensional clustering. Network integration does not work well because the networks studied here are weighted, and a simple average blurs the latent community structure information presented in each dimension. In terms of performance ranking, feature integration ≺ utility integration ≺ partition integration ≺ network integration. Feature integration, by removing noise in each dimension, yields the most accurate community structure among all the methods.


5  Further  Analysis 

In the previous section, we have demonstrated that feature integration tends to outperform other integration schemes. In this section, we perform further analysis concerning the computational time of different methods, the normalization effect during integration, sensitivity to related parameters, and alternative community detection methods for integration.


5.1  Efficiency  Study 

The four multi-dimensional integration schemes differ drastically in time complexity. Table 4 summarizes the asymptotic time complexity of different methods. Clearly, network integration and utility integration are the most efficient: they require averaging the network matrices or utility matrices, with time complexity $O(dm)$, followed by one instance of eigenvector computation and k-means clustering. Feature integration and partition integration require the computation of structural features in each network dimension. Note that this can be accelerated via parallel computing. Feature integration needs to compute the SVD of a dense matrix $X$ (Eq. (27)) of size $n \times d\ell$, which costs $O(n d \ell \cdot \min(n, d\ell))$. Since $d \ll n$ and $\ell \ll n$, the additional computational cost is still acceptable. Partition integration avoids the SVD computation for integration but requires many more runs of k-means clustering. Since the major computational cost is associated with the eigenvector problem, feature integration and partition integration are expected to take more time.

Figure 10 and Figure 11 show the computational time of modularity maximization with respect to a variety of community numbers and network sizes.

Table 4: Time complexity of different integration strategies. Integration Cost denotes the additional cost to perform the integration. #T_utility, #T_eig, and #T_kmeans denote the required number of utility matrix constructions, eigenvector computations, and k-means clusterings, respectively. T_utility, T_eig and T_kmeans are specified in Eqs. (15), (16) and (17).

Integration Scheme      Integration Cost        #T_utility  #T_eig  #T_kmeans
Network Integration     O(dm)                   1           1       1
Utility Integration     O(dm)                   d           1       1
Feature Integration     O(ndℓ · min(n, dℓ))     d           d       1
Partition Integration   T_kmeans                d           d       d

Fig. 10: Computation time with respect to varying number of communities on YouTube of 15,088 nodes

Fig. 11: Computation time with respect to varying network size (the number of communities is fixed to 40)


In both figures, feature integration and partition integration are comparable, which is consistent with our analysis. By contrast, network integration and utility integration need to compute the eigenvectors of only one utility matrix, and are thus more efficient. However, as we have demonstrated in the previous section, the performance of these two strategies is not comparable to feature integration. Note that in Table 4, we only show the asymptotic computation time. In reality, the network density can also affect the computational cost. We observe that when the number of clusters is huge, network integration takes even more time than feature integration (100 communities in Figure 10). As shown in Figure 11, the computational time scales linearly with respect to network size, which is promising for applications to large-scale networks. In summary, if efficiency is the major concern, we recommend utility integration, which is fastest and is second in performance only to the best integration scheme. Otherwise, feature integration, though with additional computational cost, should be selected.


5.2  Normalization  Effect 

In the previous section, we showed that network integration and utility integration are not comparable to feature integration. One question is whether we can find an effective normalization or weighting scheme such that they can be integrated more effectively. Here, as an attempt, we try some straightforward schemes and show the effect.

Fig. 12: Performance with Normalization (Weighting) Schemes

For network integration, a natural solution is to normalize the interaction by the total number of interactions. Specifically, we have the following weighted network integration:

$$\bar{A} = \sum_{i=1}^{d} A^{(i)}_{normalized} = \sum_{i=1}^{d} \frac{A^{(i)}}{2 m^{(i)}} \qquad (32)$$

Essentially, after this normalization, the total weight of the interaction becomes 1 in each dimension.
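As a small illustration of this normalization, each dimension can be divided by the sum of its entries, which for a symmetric interaction matrix equals twice its total edge weight. The function name below is hypothetical, and the sketch assumes dense NumPy arrays with at least one interaction per dimension.

```python
import numpy as np

def normalized_network_integration(networks):
    """Sum the dimensions after scaling each to total weight 1.

    For a symmetric matrix A, A.sum() equals 2m, so A / A.sum()
    has entries that sum to 1.
    """
    return sum(np.asarray(A, dtype=float) / np.sum(A) for A in networks)
```

With this scaling, a dimension with very intensive interactions no longer dominates the integrated network merely because of its larger total weight.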

As for utility integration, one hypothesis is to use the community strength in each dimension as a guide to do the weighted average. If one dimension's community structure is more prominent, it seems reasonable to trust that dimension more. Let $Q^{(i)}$ denote the modularity computed in dimension $i$. We integrate the utility matrix in a weighted fashion as follows:

$$\bar{M} = \sum_{i=1}^{d} Q^{(i)} M^{(i)} \qquad (33)$$

Due to the space limit, we only show the performance on the contact dimension of the YouTube network in Figure 12. Normalizing network interactions helps most of the time, and utility weighting shows performance comparable to the simple average of utilities. It seems that assigning different weights to the dimensions requires a deeper understanding of the dimensions. Even after normalization and weighting, the performance of network integration and utility integration is still not comparable to feature integration.


5.3  Sensitivity  of  Feature  Integration 

In feature integration, one parameter is the number of structural features to extract ($\ell$ in Figure 5). In this part, we study the performance sensitivity of feature integration with respect to this parameter. We vary the number of structural features from 10 to 500 and show the performance variations in Figure 13. The performance stabilizes when a reasonably large number of structural features (say, > 150) are extracted. As long as there are enough structural features, the performance is reasonably good. That is, feature integration is not sensitive to the parameter over a large range. In practice, we can start from a reasonably large number, and exploit cross-dimension network validation to select a proper parameter.

Fig. 13: Sensitivity to Number of Structural Features


5.4  Alternative  Community  Detection  Methods 

Note that this work presents a general framework to integrate information of heterogeneous interactions. Previously, we showed the result based on modularity maximization. The same integration schemes can be applied to other community detection methods as well, such as block model approximation and spectral clustering. The latent space model is not included here due to its high computational cost as discussed in Section 2.5. We can simply replace the utility matrix with the network interaction matrix or the graph Laplacian as specified in Eq. (14). One interesting question is which community detection method is the best.

Here, we combine the feature integration strategy with block model approximation, spectral clustering and modularity maximization, respectively. The resultant performance on the contact dimension of the YouTube network is plotted in Figure 14. Spectral clustering is consistently better than modularity maximization and block model approximation. This result is consistent with that reported in [42].

Figure 15 shows the performance of four different integration schemes with spectral clustering. Clearly, feature integration, similar to the case of modularity maximization, is the winner. Note that our integration scheme is independent of the community detection method. With a properly constructed utility matrix, we might achieve better performance.

Fig. 14: Performance of Different Community Detection Methods

Fig. 15: Performance of Integration Schemes with Spectral Clustering


6  Related  Work 

Multi-dimensional networks (or multiple networks constructed from disparate sources) appear in many web applications. Meza et al. [1] construct a semantic web to detect the Conflict of Interest relationship among paper reviewers and authors. Two social networks, the FOAF (Friend-of-a-Friend) and DBLP (co-author) networks, are integrated in terms of attribute similarity of persons. Jung et al. [16] construct a semantic social network which includes a social network, an ontology network and a concept network. It is shown that relationships in one network might be inferred from another. Jin et al. [44] study the entity ranking problem in social networks. The authors first extract different relations between entities and construct heterogeneous social networks. Then they integrate the constructed networks with different weighting methods for more accurate ranking. Zhou et al. [46] recommend documents in a digital library by integrating a citation network and networks of documents and other related entities.

Some work attempts to address unsupervised learning with multiple data
sources or clustering results, such as cluster ensemble [32,38,11] and consensus
clustering [24,15,27,12]. These methods essentially fall into the partition
integration scheme presented in our framework. Most of the algorithms aim to
find a robust clustering based on multiple clustering results, which are
prepared via feature or instance sampling or disparate clustering algorithms.
A similar idea is applied to community detection in social networks [13]. A
small portion of the connections between nodes is randomly removed before each
run, leading to multiple different clustering results. Those clusters occurring
repeatedly are considered more stable, and are deemed to reflect the natural
communities in reality. However, all the cluster ensemble methods concentrate
on either attribute-based data or one-dimensional networks.
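
To make the ensemble idea concrete, the following minimal Python sketch builds
a co-association matrix, a common device in cluster ensemble methods that
records the fraction of runs in which two nodes fall into the same cluster. It
is an illustration only and not the specific procedure of [13] or of the other
cited methods; the function name co_association and the toy label vectors are
assumptions made for this example.

```python
import numpy as np

def co_association(partitions):
    """Fraction of partitions in which each pair of nodes shares a cluster."""
    partitions = np.asarray(partitions)      # shape: (num_runs, num_nodes)
    num_runs, num_nodes = partitions.shape
    counts = np.zeros((num_nodes, num_nodes))
    for labels in partitions:
        # Pairwise comparison marks node pairs placed in the same cluster.
        counts += (labels[:, None] == labels[None, :])
    return counts / num_runs

# Three hypothetical clustering results over five nodes.
# Label ids are arbitrary within each run; only co-membership matters.
runs = [
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
C = co_association(runs)
```

Entries of C close to 1 correspond to node pairs that are grouped together
consistently across runs; a consensus partition could then be derived by
clustering this matrix.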

Another related field is multi-view clustering. Bickel and Scheffer [5]
propose co-EM and an extension of k-means and hierarchical clustering to handle
data with two conditionally independent views. Sa [9] creates a bipartite graph
based on the two views and tries to minimize the disagreement. Different
spectral frameworks with multiple views are studied in [45] and [21]. The
former defines a weighted mixture of the random walks over each view to
identify communities. The latter assumes that the clustering membership of each
view is provided and finds an optimal community pattern by minimizing the
divergence between the transformed optimal pattern and the community
membership of each view. A variant of utility integration based on block model
approximation plus regularization is presented in [37]. Similarly, [2] suggests
combining graph Laplacians for semi-supervised learning. Our experiments verify
that the proposed integration schemes also apply to spectral clustering and
block model approximation, and that feature integration tends to be the most
robust one.
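
As a rough illustration of combining graph Laplacians across views, the sketch
below forms a weighted sum of the normalized Laplacians of each dimension and
embeds the nodes with the eigenvectors of smallest eigenvalue. This is a
simplified illustration under stated assumptions, not the method of [2] or
[37]: the function names, the uniform weights and the toy network (two
triangles joined by a different bridge edge in each dimension) are all
hypothetical.

```python
import numpy as np

def normalized_laplacian(A):
    """I - D^{-1/2} A D^{-1/2}; assumes every node has at least one edge."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def combined_spectral_embedding(adjacencies, weights, k):
    """Embed nodes with the k smallest eigenvectors of the weighted Laplacian sum."""
    L = sum(w * normalized_laplacian(A) for w, A in zip(weights, adjacencies))
    _, eigvecs = np.linalg.eigh(L)  # eigenvalues are returned in ascending order
    return eigvecs[:, :k]

def toy_dimension(bridge):
    """Two triangles {0,1,2} and {3,4,5} connected by a single bridge edge."""
    A = np.zeros((6, 6))
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), bridge]
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A

# Two dimensions share the same communities but differ in the bridge edge.
dims = [toy_dimension((2, 3)), toy_dimension((0, 5))]
embedding = combined_spectral_embedding(dims, weights=[0.5, 0.5], k=2)
second_vector = embedding[:, 1]  # its sign pattern splits the two triangles here
```

In this toy case the sign of the second eigenvector separates the two
triangles. With non-uniform weights, the same function shows how the choice of
weighting changes the combined embedding.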

Unsupervised multiple kernel learning [39] is relevant to network integration
if we regard each dimension of the network as a similarity or kernel matrix.
Multiple kernel learning aims to find a combination of kernels that is optimal
for classification or clustering. Unfortunately, its limited scalability
hinders its application even to a medium-size network.

Some theoretical analysis of multi-view clustering via canonical correlation
analysis (CCA) is presented in [7]. It shows that under the assumption that the
views are uncorrelated given the cluster label, a much weaker condition is
required for CCA to separate clusters successfully. However, the conclusion is
based on two views, each consisting of attributes. Further research is required
to generalize the theoretical result to networks of multiple heterogeneous
interactions.


7  Conclusions  and  Future  Work 

Multi-dimensional networks commonly exist in many social networking sites,
reflecting diverse individual activities. In this work, we propose and discuss
different strategies to detect the latent communal structure in a
multi-dimensional network. We formally describe the community detection problem
in multi-dimensional networks and present a framework of different integration
schemes to handle the problem. We show that representative community detection
methods, such as latent space models, block model approximation, spectral
clustering, and modularity maximization, can be presented in a unified view
involving four components: network interactions, utility matrix, structural
features and community partitions. In this way, we can integrate the
information presented in different network dimensions in terms of each
component, leading to four different integration schemes: network integration,
utility integration, feature integration and partition integration. We
systematically study these different integration schemes and show that feature
integration, which extracts structural features from each dimension of a
multi-dimensional network and integrates them via principal component
analysis, outperforms the other integration schemes.
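
The feature integration idea summarized above can be sketched in a few lines
of Python. The sketch assumes that the structural features of each dimension
are the top eigenvectors of its modularity matrix and that the principal
component analysis step is carried out by an uncentered singular value
decomposition of the concatenated features. The function names and the toy
network (two triangles joined by a different bridge edge in each dimension)
are hypothetical, and this is an illustration rather than a definitive
implementation of the procedure evaluated in the paper.

```python
import numpy as np

def modularity_matrix(A):
    """B = A - d d^T / (2m), where d holds node degrees and 2m = sum(d)."""
    d = A.sum(axis=1)
    return A - np.outer(d, d) / d.sum()

def structural_features(A, num_features):
    """Top eigenvectors of the modularity matrix of one network dimension."""
    _, eigvecs = np.linalg.eigh(modularity_matrix(A))  # ascending eigenvalues
    return eigvecs[:, ::-1][:, :num_features]          # largest eigenvalues first

def feature_integration(adjacencies, num_features, k):
    """Concatenate per-dimension features and keep the top-k left singular vectors."""
    X = np.hstack([structural_features(A, num_features) for A in adjacencies])
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k]

def toy_dimension(bridge):
    """Two triangles {0,1,2} and {3,4,5} connected by a single bridge edge."""
    A = np.zeros((6, 6))
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), bridge]
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A

# Two dimensions share the same communities but differ in the bridge edge.
dims = [toy_dimension((2, 3)), toy_dimension((0, 5))]
integrated = feature_integration(dims, num_features=1, k=1)
```

In this toy case the sign of the single integrated feature separates the two
triangles. For more than two communities, the rows of the integrated feature
matrix would typically be passed to a clustering routine such as k-means to
obtain the final partition.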

As we have shown in the empirical study, utility integration is efficient
when compared with feature integration. However, its performance depends
on a clever weighting scheme over each dimension. It would be intriguing to
find an effective scheme that boosts the performance of utility integration to
a level comparable to that of feature integration while maintaining
efficiency. In our current work, we assume that heterogeneous interactions
share the same community structure. When community structures vary
significantly in subsets of dimensions, new research questions arise. Can we
automatically determine which dimensions share the same community structure?
How are they correlated? By cross-dimension network validation, we might be
able to calibrate the correlation between different network dimensions.
However, this problem becomes complicated if some communities are shared
across different dimensions whereas others are not. Further research is
required in this area. It would also be interesting to extend the integration
strategies to handle overlapping communities to construct a semantic ontology
from tag networks. We expect that more research on community detection in
multi-dimensional networks will emerge in the near future.